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Abstract 

In this paper, we propose a fast distributed solver for linear equations given by symmetric diagonally dominant 
M-Matrices. Our approach is based on a distributed implementation of the parallel solver of Spielman and 
Peng by considering a specific approximated inverse chain which can be computed efficiently in a distributed 
fashion. Representing the system of equations by a graph G, the proposed distributed algorithm is capable of 
attaining e-close solutions (for arbitrary e) in time propotional to (number of nodes in G), a (upper bound 
on the size of the R-Hop neighborhood), and (maximum and minimum weights of edges in G). 
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1. Introduction 


Solving systems of linear equations in symmetric diagonally matrices (SDD) is of interest to researchers in 
a variety of fields including but not limited to, solutions to partial differential equations ITJ , computations of 
maximum flows in graphs , machine learning [ 2 ^ , and as basis for various algorithms U 

Much interest has been devoted to determining fast algorithms for solving SDD systems. Spielman and 
Teng proposed a nearly linear-time alrorithm for solving SDD systems, which benefited from the multi¬ 
level framework of [H, preconditioners @ 1 , and spectral graph sparsifiers iil. Further exploiting these 
ingredients, Koutis et. al [H, [l^ developed an even faster algorithm for ac^iring e-close solutions to SDD 
linear systems. Further improvements have been discovered by Kelner et. al |12l |. where their algorithm relied 
on only spanning-trees and eliminated the need for graph sparsifiers and the multi-level framework. 

On the parallel side, much progress has been made on developing such solvers. Koutis and Miller 1^ 


proposed an algorithm requiring nearly-linear work and depth for planar graphs. This was then extended 
to general graphs by leading to depth close to Peng and Spielman [ 2 ^ have proposed an efficient parallel 
solver requiring nearly-linear work and poly-logarithmic depth without the need for low-stretch spanning trees. 
Their algorithm, which we distribute in this paper, requires sparse approximate inverse chains which 
facilitates the solution of the SDD system. 

Less progress, on the other hand, has been made on the distributed version of these solvers. Current 
methods, e.g., Jacobi iteration , can be used for distributed solutions but require substantial complexity. 


In [17|, the authors propose a gossiping framework which can be used for a distributed solution of the above 


linear system. Recent work [1^ considers a local and asynchronous solution for solving systems of linear 
equations, where they acquire a bound on the number of needed multiplication proportional to the degree and 
condition number for one component of the solution vector. 

Contributions: In this paper, we propose a fast distributed solver for linear equations given by symmetric 
diagonally dominant M-Matrices. Our approach distributes the parallel solver in by considering a spe¬ 
cific approximated inverse chain which can be computed efficiently in a distributed fashion. Our algorithm’s 


^ Wmax 


R Wn, 


log ( 7 )^, with n being the number of nodes in graph G, 


computational complexity is given by O 
LFmax and Wmin denoting the largest and smaller weights of the edges in G, respectively, a = min < n, — t- > 

I ^^max J- J 

representing the upper bound on the size of the R-Hop neighborhood Vv G V, and e G (0, being the precision 
parameter. Our approach improves current linear methods by a factor of log n and by a factor of the degree 
compared to 0 for each component of the solution vector. 


2. Problem Definition & Notation 

We consider the following system of linear equations: 

Mqx = bo (I) 

where IUq is a Symmetric Diagonally Dominant M-Matrix (SDDM). Namely, Mg is symmetric positive definite 
with non-positive off diagonal elements, such that for alH = 1,2,..., n: 

n 

[MoU>- 

The system of Equations in [1] can be interpreted as representing an undirected weighted graph, G, with Mq 
being its Laplacian. Namely, G = (V, E, IK), with V representing the set of nodes, E denoting the edges, 
and W representing the weighted graph adjacency. Nodes Vi and vj are connected with an edge e = {i,j) iff 
> 0, where: 

Wij = [Mo]ii (if i = j), or Wij = - [Mo]ij , otherwise. 

Following [ 2 ^, we seek e-approximate solutions to x*, being the exact solution of Mgx = bo, defined as: 

Definition 1. Let x* G M”' be the solution of Mx = bo- A vector s £ M” is called an e— approximate 
solution, if: 

\\x* - x\\mq<A\x*\\mo^ where ||n||^^ = n'^Mon. (2) 
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The R-hop neighbourhood of node Vk is dehned as N,. (vk) = {n G V : dist {vk,v) < r}. We also make use 
of the diameter of a graph, G, defined as diam (G) = max^.^^^gv dist (uj, Vj). 


Definition 2. We say that a matrix A G has a sparsity pattern corresponding to the R-hop neighborhood 
if Aij = 0 for alH = 1,..., n and for all j such that vj ^ {vi). 


We will denote the spectral radius of a matrix A by p{A) = max|Aj|, where A* represents an eigenvalue 
of the matrix A. Furthermore, we will make use of the condition numbeilil, k (A) of a matrix A defined as 


K = 


k(A) 


fn [21l | it is shown that the condition number of the graph Laplacian is at most O (n 


,3 Wrr, 


Wr„ 


where Wmax and Wmin represent the largest and the smallest edge weights in G. Finally, the condition number 
of a sub-matrix of the Laplacian is at most O I, see 


2.1. Problem Definition 

We assume that each node G V has information about the weights of adjacent edges. Further, each 
node Vk has the capabilities of storing the value of the component of bo, which is denoted as At each 
time step, nodes can exchange information with their neighbours. Each node is responsible for determining the 
corresponding component, Xi, of the solution vector x G M”. We also assume a synchronized model whereby 
time complexity is measured by a global clock. The goal is to find e-approximate solution for MqX = bo in a 
distributed fashion, while being restricted to R-hop communication between the nodes. 


3. Background 

3.1. Standard Splittings & Approximations 

Following the setup in [20(], we provide standard definitions required in the remainder of the paper: 

Definition 3. The standard splitting of a symmetric matrix Mq is; 

ATo = Dq — Aq (3) 

here Dq is a diagonal matrix such that [Dq\ - = [Mq] - for z = 1, 2,..., n, and Aq representing a non-negative 

symmetric matrix such that = — [Mo]ij if z / j, and [Aq\ - = 0. 

We also define the Loewner ordering: 

Definition 4. Let be the space of n x n-symmetric matrices. The Loewner ordering ^ is a partial order 
on such that Y X A and only if X — "E is positive semidefinite. 

Finally, we define the “~q” operation used in the sequel to come as: 

Definition 5. Let X and Y be positive semidefinite symmetric matrices. Then X ~q, Y if and only iff 

e"“X fi^Y e“X (4) 

with A < B meaning B — A is positive semidefinite. 

Based on the above definitions, the following lemma represents the basic characteristics of the operator: 
Lemma 1. Let X,Y, Z and, Q be symmetric positive semi definite matrices. Then 

(1) If X Y, then X + Z Y + Z, (2) If X Y and Z Q, then X + Z Y + Q 

(3) If X Y and Z Q, then X + Z Y + Q, (4) If X Y and Y ^^2 then X ’^ai+a 2 ^ 
(5) If X, and Y are non singular and X ~q, Y, then X~^ Y~^, (6) If X ~q, Y and V is a matrix, 

then V'^XV V'^YV 

The next lemma shows that good approximations of Mq^ guarantee good approximated solutions of 
MqX = bo. 

Lemma 2. Let Zq dM^^ , and x = ZobQ. Then x is — 1) approximate solution of Mqx = foo- 

Proof. The proof can be found in the appendix. □ 

We next discuss the parallel SDDM solver introduced in (^ . 


^Please note that in the case of the graph Laplacian, the condition number is defined as the ratio of the largest to the smallest 
nonzero eigenvalues. 
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3.2. The Parallel SDDM Solver 

The parallel SDDM solver proposed in [20| is a parallelized technique for solving the problem of Section [2.11 
It makes use of inverse approximated chains (see Definition [ 6 |) to determine x and can be split in two steps. 
In the first step, denoted as Algorithm [H a “crude” approximation, xq, of x is returned, xq is driven to the 
e-close solution, x, using Richardson Preconditioning in Algorithm [2j Before we proceed, we start with the 
following two Lemmas which enable the definition of inverse chain approximation. 

Lemma 3. ^3/ If M = D — A is an SDDM matrix, with D being positive diagonal, and A denoting a 

non-negative symmetric matrix, then D — AD~^A is also SDDM. 

Lemma 4. Let M = D — A he an SDDM matrix, where D is positive diagonal and, A a symmetric 

matrix. Then 


{D - A)-^ = ^ D-^ + {I + D-^A) {D - AD-^A) ^ (/ + 

Given the results in Lemmas [3] and 01 we now can consider inverse approximated chains of ALq: 


( 5 ) 


Definition 6. Let C = {ALq, Afi,..., be a collection of SDDM matrices such that Mi = Di — Ai, 
with Di a positive diagonal matrix, and Aj denoting a non-negative symmetric matrix. Then C is an inverse 
approximated chain if there exists positive real numbers eo, ei,..., such that: (1) For i = 1,... ,d: Di — 
-Aj ~ei_i T)i—\ Aj_i, (2) For i — 1,... ,d. Di Di—\, and (3) D^^ -^d- 


Algorithm 1 ParallelRSolve {Mq, Mi, ..., M^, bo) 

1 : Input: Inverse approximated chain, {Mq,M\, ..., Af^}, and bo being 
2 : Output: The “crude” approximation, xq, of x* 

3: for i = 1 to d do 
4: bj = (J -|- Aj_iD^_^) bi_i 

end for 
5: Xd = D^^bd 
6: for z = d — 1 to 0 do 
7: Xi = I + {I + Df^Ai) Xi+i] 

end for 
8 : return xq 


The quality of the “crude” solution returned by Algorithm [T] is quantified in the following lemma: 

Lemma 5. fMj Let {Mq, Mi,..., Md} be the inverse approximated chain and denote Zq be the operator 
defined by ParallelRSolve {Mq, Mi, ..., Md, bo), namely, xq = ZQbQ. Then 


Zq 


E d 

i= 0 ' 


M, 


-1 


( 6 ) 


Algorithm [T] returns a “crude” solution to Mqx = b. To obtain arbitrary close solutions, Spielman et.al [20(] 
introduced the preconditioned Richardson iterative scheme, summarized in Algorithm 0J Following their anal¬ 
ysis, Lemma [ 6 ] provides the iteration count needed by Algorithm [2] to arrive at x. 


Lemma 6 . fEdlJ Let {Afo, Afi ... Af^} be an inverse approximated chain such that < 3 In 2. Then 

ParallelESolve {Mq, Mi, ..., Md, bo, e) arrives at an e close solution of x* in q = O (log i) iterations. 


4. Distributed SDDM Solvers 

Next, we distribute the parallel solver of Section 13.21 Similar to [lO], we first introduce an approximate 
inverse chain which can be computed in a distributed fashion. This leads us to distributed version of the 
“crude” solver (see Section ill). Contrary to [ 2 ^, however, we then generalize the “crude” distributed solver 
to allow for exact solutions (see Section 02]) of Equation [H We summarize our results in the following theorem: 

Theorem 1 . There exists a distributed algorithm, A ({[Afojfci,... [Afo]fcn}; [bo]fc, e), that computes e-close ap¬ 
proximations to the solution of Mqx = bo in O (re^log/tlog (i)) time steps, with n the number of nodes in G, 
K the condition number of Mq, and [Afojfc. the row of Mq, as well as e ^ (O, representing the precision 
parameter. 

Note that for each node G V, the input information for algorithm A is the row of Mq (i.e., the 
weights of the edges adjacent to Vi), the precision parameter e, and the kf^ component of [bo] (i.e., [bo]fc), 
easily rendering a distributed solver. 
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Algorithm 2 ParallelESolve {Mq, ..., bo, e) 

1: Input: Inverse approximated chain {Mq, Mi, ..., M^}, bo, and e. 
2: Output: e close approximation, x, of x* 

3: Initialize: uq = 0; 

4; X = ParallelRSolve {Mq, Mi, ..., M^, bo) (he., Algorithm [T]) 

5; for /c = 1 to 5 do 
6 : = Moyk-i 

7: wf ^ = ParallelRSolve [Mq, Mi,..., M^, 

( 2 ) 

8 : Vk = Vk-l -Ul' +X 

end for 


9; x = yq 
10: return x 




Algorithm 3 DistrRSolve^ {[Mo]fci,.. 



Part One: Computing [bi]k 

[bl]fc [boj/c “h [AqDq %j[bo]j 

for i = 2 to d do 


for 

j : Vj G N2i-i 

[vk) do 





iAoD^Y~\ 

_ [Do]rr 

kj L-ir=l [Doljj 

{AqD^Y'\ 

kr 

iAQD^Y~\ 

end 

for 






j jr 


[bi]k — [bi-i]k + Ylj-.vjGN i_i{vk) 


(AoD^Y 


end for 


J kj 


[bi- 


iJi 


Part T^vo: Computing [xq]i. 
[Xd]k = 

for i = d — 1 to 1 do 


for j : Vj € N2i(r’fc) do 


'{DQ^AQf 

kj 

_ [Do]jj 

^r=l [Do\rr 

'{D^^AQr-\ 

kr 

'{D^^Aor-^ 

for 







[Xi\k = 

end for 


2 [£> 0 ]*: 


+ 


fc +1 

2 


+ IE 


2 Z-^j:Vje'N^i{vk) 


(On"'An 


J kj 


[Xi+l\] 


[®o]fc = 2i^ + ^ + I ^Ao]kj[xi]j 

return: [a3o]fc 


4.1. “Crude” Distributed SDDM Solver 

Starting from Mq = Dq — Aq , consider the collection C = {Ao, Oo, Ai, Di,..., Ad, Dd}, where Dk = Dq, 
and A]^ = Dq (^D^Aq) , for k = d} with Dq = Dq^ and Aq = Aq. Since the magnitude of the 

eigenvalues of O^^'An is strictly less than 1 , [Dq^Aq) tends to zero as k increases which reduces the length 
of the chain needed for the distributed solver. It is easy to verify that C is an inverse approximated chain, 
since: ( 1 ) Di - Ai Di_i - Ai_iOEiAi_i with e* = 0 for i = 1 ,... , d, ( 2 ) Di Di_i with = 0 

for i = 1,..., d, and (3) Dd Dd — Ad- Using the above. Algorithm [3] (our first contribution) describes the 
distributed version of the “crude” parallel solver, which returns the component of the approximate solution 
vector, [xQ]k. Each node, Vk € V, receives the k^^ row of Mq, the k^^ value of bo (i.e., [bo]fc), and the length 
of the inverse approximated chain d as inputs. It operates in two parts. Part One and Part Two. The first, 
computes the k^^ component of bi, \bi]k for i G {1,..., d}, which is then used in Part Two to return [aio]fc- 
Analysis of Algorithm [3l Next, we present the theoretical analysis, showing that DistrRSolve computes 
the component of the “crude” approximation of x*. Further, we provide the time complexity analysis. 

Lemma 7 . Let Mq = Dq — Aq he the standard splitting of Mq. Let Z'q he the operator defined by 
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Algorithm 4 DistrESolve ({[Mpjfci, ■ ■ ■, [Mo\kn}, [bo]k,d, e) _ 

Initialize: [yo]k = 0; [x]k = DistrRSolve ({[Mo]fci, • • •, [Mo]kn}, [bo]k,d) (i.e., Algorithm [3]) 
for i = 1 to g do 


u 


u 


-I k 


- k 


= [Do] kk [yt— i]a: [^4—l] j 

= DistrRSolve({[Mo]fci,..., [Mo]kn}, \u, 


( 1 ) 


J k ' 


,d,] 


[yt]k = [yt-i]k - 

end for 

[®]fc [yq]k 
return [x]k 



+ [x]k 


DistrRSolve{[{[Mo]ki ,..., [Mo]kn}, [&o]fc, d) (i.e., xq = Z'^bo). Then 

Z'o 

Moreover, Algorithmic requires O [dn‘^^ time steps. 

Proof. See Appendix. 

□ 


4 . 2 . “Exact” Distributed SDDM Solver 

Having introduced DistrRSolve, we are now ready to present a distributed version of Algorithm [2] which 
enables the computation of e close solutions for Mqx = hg- Similar to DistrRSolve, each node G V receives 
the row of Mq, [holfc; d and a precision parameter e as inputs. Node Vk then computes the component 
of the e close approximation of x*. 

Analysis of Algorithm [4t The following lemma shows that DistrESolve computes the k^^ component of 
the e close approximation of x* and provides the time complexity analysis 

Lemma 8 . Let Mq = Dq — Aq be the standard splitting. Further, let Cd < ^ln2 in the nverse approxi¬ 
mated chain C = {AQ,DQ,Ai,Di,...,Ad,Dd}. Then DistrESolve{{[MQ\ki,... ,[MQ\kn}-,\bo\k-,d,e) requires 
O (log i) iterations to return the k^^ component of the e close approximation forx*. 

Proof. See Appendix. □ 

The following lemma provides the time complexity analysis of DistrESolve: 

Lemma 9. Let Mq = Dq — Aq be the standard splitting. Further, let Cd < 3 In 2 in the inverse approxi¬ 
mated chain C = {AQ,DQ,Ai,Di,...,Ad,Dd}. Then, DistrESolve{{\MQ\ki, ■ ■ ■ ,[MQ]kn}-,\bQ\k-id,e) requires 
O ((in^log(i)) time steps. 

Proof. See Appendix. □ 

4 . 3 . Length of the Inverse Chain 

Both introduced algorithms depend on the length of the inverse approximated chain, d. Here, we provide 
an analysis to determine the value of d which guarantees < | ln2 in C = {Aq, Dq, A\, Di ,..., Ad, Dd}: 

Lemma 10. Let Mq = Dq — Aq be the standard splitting and k denote the condition number of Mq. Consider 
the inverse approximated chain C = {Aq, Dq, Ai, Di,..., Ad, Dd} with a length d = [log ^21n 

then Dq Dq — Dq {^DqAq^ , with Cd < | In 2. 

Proof. The proof will be given as a collection of claims: 

Claim: Let k be the condition number of Mq = Dq — Aq, and {\i}f^-^ denote the eigenvalues of Dq^Aq. Then, 
|Aj| < 1 — i, for alH = 1,..., n 

Proof. See Appendix. □ 
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Notice that if A* represented an eigenvalue of then is an eigenvalue of (Dq ^AqY for all r € N. 

Therefore, we have 




( 7 ) 


Claim: Let M be an SDDM matrix and consider the splitting M = D — A, with D being non negative 
diagonal and A being symmetric non negative. Further, assume that the eigenvalues of D~^A lie between —a 
and fj. Then, (1 — /3)iA ^ D — A ^ (1 + a)D. 


Proof. See Appendix. 

Combining the above results, give 1 — (l — 


2 “* 


Dd -< Dd—Ad ^ 


1 + 


2 “^ 


□ 


Dd Hence, to guarantee 

1x2'^ 


that Dd Dd — Ad, the following system must be satished: (1) e < 1 — (l — i) , and ( 2 ) > 

1+ (l — . Introducing 7 for (l — , we arrive at: (1) > In and (2) > ln(l + 7 ). Hence, > 

{in (t^) ,ln(l + 7 )} = In (i^)- Now, notice that if d = [logcfi] then, 7 = (l - i)^ = (l “ 7 ) 


max 


Hence 


— ’ 


> In ( 1 ^) < In This gives c = [21n (- 3 |^)l, implying = In (^) < | 


In 2 . 


□ 


Using the above results the time complexity of DistrESolve with d = [log {2 In { 1 } 
O (n^ logKlog(i)) times steps, which concludes the proof of Theorem [TJ 


5. Distributed R-Hop SDDM Solver 

Though the previous algorithm requires no knowledge of the graph’s topology, but it requires the informa¬ 
tion of all other nodes (i.e., full communication). We will outline an R-Hop version of the algorithm in which 
communication is restricted to the R-Hop neighborhood between nodes. The following theorem summarizes 
these main results: 


Theorem 2. There is a decentralized algorithm A({[Afo]fci,... [AToJfcn}) [ho]fc,R, e), that uses only R-Hop com¬ 
munication between the nodes and computes e-close solutions to Mqx = bo in O ((^ + aRdmax) log( 7 )) time 
steps, with n being the number of nodes in G, dmax denoting the maximal degree, k the condition number of 

Mq, and a = min l^n, representing the upper bound on the size of the R-hop neighborhood Vu G V, 

and e G (0, being the precision parameter. 

Given a graph G formed from the weighted Laplacian Mq, the following corollary easily follows: 


Corollary 1. Let Mq be the weighted Laplacian of G = (V, E,VU). There exists a decentralized algo¬ 
rithm that uses only R-hop communication between nodes and computes e close solutions of Mqx = 60 

O time steps, with n being the number of nodes in G, Wmax,Wmin denoting the largest and 

the smallest weights of edges in G, respectively, a = min |n, | representing the upper bound on the 

size of the R-hop neighborhood Vu G V, and e G (0, being the precision parameter. 


5.1. “Crude” R-Hop SDMM Solver 

Algorithm [S] presents the “crude” R-Hop solver for SDDM systems using the same inverse chain of Sec¬ 
tion ICT Each node G V receives the row of Mq , component, [foo]fc of bQ, the length of the inverse 
chain, d, and the local communication bouncfl R as inputs, and outputs the component of the “rude” 
approximation of x*. 

Analysis of Algorithm [5] The following Lemma shows that RDistRSolve computes the k^^ component 
of the “crude” approximation of x* and provides the algorithm’s time complexity 


■^For simplicity, R is assumed to be in the order of powers of 2, i.e., R = 2^. 
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Algorithm 5 RDistRSolve ({[Mo]fei, • • •, [Mo]kn}, [bo]k,d, R) 


Part One: 

..., = {{g^...., {[D„-'A„]„,..., [D„-'A„|,4 = { 1^,,,,, 1^) 


[C'olfci, • • • 5 [Co]kn — Compo ([Mo]fci,..., [Moj^n, R), [Ci]ki, • • •, [Ci]kn — Comp]^ , [Mo]fc„, R) 


Part Two: 
for i = 1 to d do 

if i — 1 < p 

[u^r\ = [AoDQ\_i]k 
for j = 2 to 2* ^ do 

= [AoDQ^u^jZi^]k 

end for 

[bi]k = [bi-i]k + [u^2i^l^]k 
it i — 1 > p 
k-l = 2-Vr 

[■^1* = [C'o&i-i]fc 

for j = 2 to li-i do 

= \Cov^:^]k 

end for 

[bi]k = 

end for 


Part Three: 

[Xd]k = MVl-DoJfcfc 

for i = d — 1 to 1 do 


do 

[Vr'>]k = [D~^Aovf^S 


[Mt + 


if i < p 

= [D^^AoXi+ilk 
for j = 2 to 2' 

^^(*+i)i, _ rn~i /i„^(*+i) 

end ^or 

2 

it i > p 
k = 2Vii 

= [CiXi+i]k 

for j = 2 to li do 

= [Ci>7R;'i>, 

end for 


i+ll 


[Xi]k — 2 

end for 


[oiik + 


i+ll 


\xo\k — 2 [Do]kk [“^O AoXi\k 

return [a;o]fc 


Algorithm 6 Compo {[Mo]ki,[Mo]kn, R) 


for ^ = 1 to ii — 1 do 

for j s.t.Vj G Ni+iivk) do 


en 


end for 
d for 


r:tirSNi(tij) 


lDo]r 

1-Doljj 


^f-[{AoD^^)%r[AoD, 

return cq = {[(Ao-D(7^)'^]fci,..., [(Ao-D(7^)'^]fcn} 


0 \r 
























Algorithm 7 Compi([Mo]fci, ■ ■ ■, [Mo]fcn,fi) 
for / = 1 to ii — 1 do 


for j s.t.Vj G Ni+i{vk) do 

[(Oo= i: ISteK^o 

r:UrGNi(uj) 


jr 


n 


d for 


ena tor 


return ci = {[(Dq ^Ao)\i ,..., [(Dq ^Ao)\ri} 


Lemma 11. Let Mq = Dq — Aq be the standard splitting and let Z'q he the operator defined by RDistRSolve, 
namely, xq = Z'^bQ. Then, Z'q RDistRSolve requires O (^a + aRdma^^, where a = 

. f (dmti-i) 1 

mm < n, -Vi- tS >, to arrive at xq. 

\^max — l) 

The proof of the above Lemma can be arrived at by proving a collection of claims: 

Claim: Matrices (Dq Aq) and [AqDq j have sparsity patterns corresponding to the i?-Hop neighborhood 
for any i? G N. 

Proof. The above claim is proved by induction on R. We start with the base case: for R = 1 

[Ao]i 


[AqDq ]ij — 


\n 


0\ii 


(if j : Vj G Ni(nj)) or [Aq-Dq ^]ij = 0 (otherwise) 


Therefore, Aq-Dq ^ has a sparsity pattern corresponding to the 1-Hop neighborhood. Assume that for all 
1 < p < -R — 1, (^A^Dq^Y has a sparsity pattern corresponding to the p — hop neighborhood. Consider, 

{AoDYY 


[{AoDfiYh 


Y,[iAoDYY-Yk[AoDY]kj 


( 8 ) 


k=l 


Since AqDq ^ is non negative, then [(AqDq 7 ^ 0 iff there exists k such that G Ni{_i(ui) and G 

Ni(uj), namely, vj G Ni{(nj). The proof can be done in a similar fashion for D^^Aq. □ 

The next claim provides complexity guarantees for CompQ and Comp]^ described in Algorithms [6] and [71 
respectively. 

Claim: Algorithms [6] and [7] use only the R-hop information to compute the row of (D^^Aq) and 
(AqDq^^)’^, respectively, in O (aRdmax) time steps, where a = min |n, |. 

Proof. The proof will be given for Compg described in Algorithm [6] as that for Comp^ can be performed 
similarly. Due to Claim 15.11 we have 


— 1\ 


(AoDo') 


J kj 




-ifi] 


1 rj 


(9) 


r=l 


r:'UrSNi(Dj) 

._ 1 \ Dl 


Therefore at iteration / -I- 1, computes the k^^ row of (AqDj^'^) using: (1) the k^^ row of {AqU^^Y, and 
(2) the column of AqD^ . Node v^, however, can only send the row of AqDq^ making AqDY 
non-symmetric. Noting that [-4o-Dy^]rj/[£)o]rr = Ao-Dq ^]jv/[Do]„- 3 , since Dj^'^AqD^^ is symmetric, leads to 

)^]fcr[AoDg ^]jr. To prove the time complexity guarantee, at each 

iteration vj. computes at most a values, where a = min l^n, is the upper bound on the size of the 

R-hop neighborhood Mv G V. Each such computation requires at most 0{dmax) operations. Thus, the overall 
time complexity is given by O{aRdmax)- □ 
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We are now ready to provide the proof of Lemma [TTl 


Proof. From Parts Two and Three of Algorithm[5j it is clear that node computes [hi]fcj [b 2 ]k, ■ ■ ■, [bd]k and 
[xj\k, • • •, [®o]a:) respectively. These are determined using the inverse approximated chain as follows 


6 , = ( 7 + = bi_i + 


-1 


( 10 ) 


Xi = ^[D- ^bi + {I + D- ^Ai)xi+i] = ]^[Dq ^bi + Xi+i + {D^ ^Aof’'xi+i] 
Considering the computation of [6i]fc,..., [b^lk for p > z — 1, we have 


— [hj—i]fc + [(AqTDq 


-1\2* 


bi-i]k = [bi-i]k + [AoDq ^. AqDq ^ bi-ijk = ^ • ^o-D( 






— [bi-i]k + 


u. 


0 - 1 ) 


- k 


with = AqDq for j = 1,... 2* ^ — 1. 


-j+i 


u 


Since AqTIq ^ has a sparsity pattern corresponding to 1-hop neighborhood (see Claim [5T]) . node computes 

(i—l) 

, based on u) , acquired from its 1-Hop neighbors. It is easy to see that Vi such that i — 1 < p 


0 - 1 ) 

0+1 


the computation of [bi]k requires O (2* ^dmax) time steps. Thus, the computation of [hi]fc, ■ ■ ■, [bp]k requires 
0(21’dmax) = 0{RdmiLx)- Now, Consider the computation of [6*]^ but for i — 1 > P 

[bi]k = [bi-i]k + = [bi-i]k + [ Cq . . . Cq _bt-i]fc = + [ Op . . . Cq u^^ = [bi-i]fc -O 


0 - 1 ) 


“1 


with Op = (ApTDq li-i = and = Oprij* for j = l,...,^j_i — 1. Since Op has a spar¬ 
sity pattern corresponding to R-hop neighborhood (see Claim [5TT]) . node Vk computes based on the 

components of u) ’ attained from its R-hop neighbors. For each i such that i — 1 > p the computing 

[bi]k requires O time steps, where a = min|n, | being the upper bound on the number of 

nodes in the R— hop neighborhood V u G V. Therefore, the overall computation of [bp_|_ 2 ]fc, • • • , \bd\k 

is achieved inO time steps. Finally, the time complexity for the computation of all of the values 

[6i]a:, [^ 2 ]^) ■ ■ ■) [bd\k is O + Rdmax^ ■ Similar analysis can be applied to determine the computational com¬ 
plexity of [x^jk, [xd-i]k, • • •) [®i]fc) i-e-) Part Three of Algorithm[5j Using LemmaO we arrive at Z'q ^ 0 ^ ■ 
Finally, using Claim [5Tl the time complexity of RDistRSolve (Algorithmic]) is O ( + aRdmaxj ■ d 


5.2. “Exact” Distributed R-Hop SDDM Solver 

Having developed an R-hop version which computes a “rude” approximation to the solution of Mqx = bo, 
now we provide an exact R-hop solver presented in Algorithm [8| Similar to RDistRSolve, each node Vk receives 
the row Mq, [bo]fc) d, R, and a precision parameter e as inputs, and outputs the component of the e 
close approximation of vector x*. 


Algorithm SEDistRSolve ({[Mojfci,..., [Mo]kn}, [ho]fc, d, R, e) _ 

Initialize: [yo]k = 0, and [x]k = RDistRSolve({[Mo]fci,..., [Mo]kn}, [bo]k,d,R) 
for t = 1 to g do 

[u[ = [Do]kk[yt-l]k — Ylj:vjeNi{vk)^-^^]kj[yt-l]j 

= RDistRSolve({[Mo]fci,..., [Mo]kn}, [u[^^]k,d,R) 

[yt]k = [yt-i]k - [u^^^]k + [x]k 

end for 

return [x]k = [yq]k 


Analysis of Algorithm [8} The following Lemma shows that EDistRSolve computes the k^^ component 
of the e close approximation to x* and provides the time complexity analysis. 
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Lemma 12. Let Mq = Dq — Aq be the standard splitting. Further, let €d < i/3ln2. Then Algorithmic requires 
O (log i) iterations to return the component of the e close approximation to x*. 

Proof. See Appendix. □ 

Next, the following Lemma provides the time complexity analysis of EDistRSolve. 

Lemma 13. Let Mq = Dq — Aq be the standard splitting and let < i/3ln2, then EDistRSolve requires 
O ((2‘^/rq; + aRdmax) log (Ye)) time steps. Moreover, for each node v^, EDistRSolve only uses information 
from the R-hop neighbors. 

Proof. See Appendix □ 


5.3. Length of the Inverse Chain 

Again these introduced algorithms depend on the length of the inverse approximated chain, d. Here, we pro¬ 
vide an analysis to determine the value of d which guarantees < | ln2 in C = {Aq, Dq, A\,Di ,..., A^, Du}- 
These results are summarized the following lemma 


Lemma 14. Let Mq = Dq — Aq be the standard splitting and let n denote the condition number Mq. Consider 
the inverse approximated chain C = {Aq, Dq, Ai, Hi, ..., A^, Dj] with length d = [log ^2 In ^ then 

Dq Dq - Dq (Hq ^Ao)^ ; with Cd < Vs In 2. 


Proof. The proof is similar to that of Section 14.31 and can be found in the Appendix. 


□ 


6. Discussion & Conclusions 


We developed a distributed version of the parallel SDDM solver of [1^ and proposed a fast decentral¬ 
ized solver for SDDM systems. Our approach is capable of acquiring e-close solutions for arbitrary e in 

Q log (e)^, with n the number of nodes in graph G, Wmax and Wmin denoting the largest and 

smaller weights of the edges in G, respectively, a = min < n, [ representing the upper bound on the 

size of the R-Hop neighborhood Vu G V, and e G (0, as the precision parameter. After developing the full 
communication version, we proposed a generalization to the R-Hop case where communication is restricted. 

Our method is faster than state-of-the-art methods for iteratively solving linear systems. Typical linear 
methods, such as Jacobi iteration Q], are guaranteed to converge if the matrix is strictly diagonally dominant. 
We proposed a distributed algorithm that generalizes this setting, where it is guaranteed to converge in the 
SDD/SDDM scenario. Furthermore, the time complexity of linear techniques is logn), hence, a case 

of strictly diagonally dominant matrix Mq can be easily constructed to lead to a complexity of O(n^logn). 
Consequently, our approach not only generalizes the assumptions made by linear methods, but is also faster 

by a factor of log n. _ _ 

In centralized solvers, nonlinear methods (e.g., conjugate gradient descent [ll|, [l^, etc.) typically offer 
computational advantages over linear methods (e.g., Jacobi Iteration) for iteratively solving linear systems. 
These techniques, however, can not be easily decentralized. For instance, the stopping criteria for nonlinear 
methods require the computation of weighted norms of residuals (e.g., ||pfc||Mo with being the search 
direction at iteration k). To the best of our knowledge, the distributed computation of weighted norms is 
difficult. Namely using the approach in [I^, this requires the calculation of the top singular value of Mq 
which amounts to a power iteration on MJMq leading to the loss of sparsity. Furthermore, conjugate gradient 
methods require global computations of inner products. 

Another existing method which we compare our results to is the recent work of the authors [l6|] where a 
local and asynchronous solution for solving systems of linear equations is considered. In their work, the authors 

derive a complexity bound, for one component of the solution vector, of O ^min ^ ^ with e 

being the precision parameter, d a constant bound on the maximal degree of G, and G is defined as a; = Gx + z 
which can be directly mapped to Aa; = b. The relevant scenario to our work is when A is PSD and G is 




/ / «;(A) + 1 . 

symmetric. Here, the bound on the number of multiplications is given by O I min Ida “ 
with k(A) being the condition number of A. In the general case, when the degree depends on the number of 
nodes (i.e., d = d{n)), the minimum in the above bound will be the result of the second term ( In i) 

leading to O (d(n)nfi;(A) In . Consequently, in such a general setting, our approach outperforms [I^ by a 
factor of d{n). 
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Appendix 


In this appendix, we will provide proof of the Lemmas in the original submission. 

Lemma 1. Let Zq ® = ■^o^o- Then x is — 1) approximate solution of Mqx = &o- 

Proof. Let x* G M” be the solution of Mqx = bo, then 

II®* “ ®IImo ~ (®* “ x)'^Mq{x* — x) = {x*)'^M qX* + {xf MqX — 2{x*)'^M qX (11) 

Consider each term separately in (1111) : 

1 . {x*)'^Mqx = ^MoZobo = b^ Z^bo 

2 . {x*fMox* = blM^^MoM^^b^ = b^M^% < e^b^Zobo 

3. ]\Tqx = b'^ZoMioZ^bf) < e^b'^Z^b^ 

where in the last step we used that if Zq i then Mq 

Therefore, m can be rewritten as: 

II®*-®IImo ^2(e^-l)b^Zo6o (12) 

Combining (fT^ with b'^Z^b^ = (x*)'^M qZqMqx* < e'^(x*)^M qx*: 

II®* - ®IImo - “ l)e^(a;*)'^M'o£c* = 2(e^ - l)e^||a;*||^^ 

□ 


Lemma 2. Let Mq = Dq — Aq be the standard splitting of Mq. Let Z'q he the operator defined by 
DistrRSolve{[{[Mo]ki,[Mo]kn}, [&o]fc, d) (i.e., xq = Z'^bo). Then 

Z'o Mq-i 

Moreover, Algorithmic requires O (dn^^ time steps. 

Proof. The proof commences by showing that (Uq Aq) and (AoDg ) have a sparsity pattern correspond¬ 
ing to the r-hop neighborhood for any r G N. This case be shown using induction as follows 

1. Base case: If r = 1, we have 


[AqDq ]ij 


Ao\ij 

[Do] a 
0 


if j : Vj G Ni(ui), 
otherwise . 


Therefore, AqDq^ has sparsity pattern corresponding to the 1-hop neighborhood. 

Assume that for all 1 < p < r — 1, {AqDq^)p has a sparsity patter corresponding to the p-hop neighborhood. 
2. Now, consider {AqDq^Y, where 


[{AoDY^Y]^J = Y,[iAoDY^Y-%[AoDY%j (13) 

k=l 

Since AqD^^ is non negative then it is easy to see that [{AQDY^Y\ij / 0 if and only if there exists k 
such that Vk G Nr-i(uj) and Vk G Ni{vj) (i.e., Vj G Nr(ui)). 
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For the same results can be derived similarly. 

Please notice that in Part One of DistrRSolve algorithm node vj. computes (in a distributed fashion) the 
components [&i]fc to [bd\k using the inverse approximated chain C = {Aq, Dq, Ai,Di ,..., Ad, Dd}. Formally, 


bi = 


I + {AoD^Y~'] b,_i = bi_i + {AoD^Y~" 






bi- 


'i-l 


2^—^ 

Clearly, at the iteration node requires the row of {^AqDq ) (i-e., the row from the previous 

iteration) in addition to the row of from all nodes Vj G N 2 i-i (u^) to compute the k^^ row of 

1 oi—I 


For computing 


[AqDq^)^ , node Vk requires the k^^ row and column of [AqDq 


n2* 


The 


J kj 


problem, however, is that node Vj can only send the row of [AqDq ^)^ which can be easily seen not to 


2* - 

be see that symmetric. To overcome this issue, node Vk has to compute the j*" column of [AqDq ) based 


on its row. The fact that Dq ^ (^AqDq ^)^ is symmetric, manifests that for r = I,... , n 


ii2‘ 




J rj 


li2‘ 




J jr 


D, 


0 Jrr 


[Do]jj 


Hence, for all r = I,..., n 




[Do], 


J rj 


\D, 


oJij 




(^oDo') 


J jr 


(14) 


Now, lets analyze the time complexity of computing components [6i]fc, [^ 2 ]^) ■ ■ ■ j \bd\k- 


Time Complexity Analysis: At each iteration z, node receives the row of {^AqDq ) from 
all nodes Vj £ N 2 i-i{vk)- using Equation [TH node Vk computes the corresponding columns as well as the 

product of these columns with the row of (Aq-D^ ) . Therefore, the time complexity at the iteration 

is O (n^ + diam (G)), where is responsible for the row computation, and diam (G) represents the 
communication cost between the nodes. Using the fact that diam (G) < n, the total complexity of Part One 
in DistrRSolve algorithm is O [dn?). 

In Part Two, node Vk computes (in a distributed fashion) [xd-i]k, [xd- 2 ]k, ■ ■ ■, [xo]k using the same inverse 
approximated chain C = {Aq, Dq, Ai, Di, ..., Ad, Dd}. 


I I r ii I I 

+ ^{DQ^AQf~" {DQ^AQf~"x,+, 


(15) 


for i = d — I,..., I. Thus, 

= 2^0 ^^0 + Y + 2 xi 

Similar to the analysis of Part One of DistrRSolve algorithm the time complexity of Part Two as well as 
the time complexity of the whole algorithm is O [dn?). 

Finally, using Lemma [5] of the original paper for the inverse approximated chains C = 
{Aq, Dq, Ai,Di, ...,Ad, Dd] yields; 


Mq\ 

□ 

Lemma 3. Let Mq = Dq — Aq be the standard splitting. Further, let < |ln2 in the nverse approxi¬ 
mated chain C = {AQ,DQ,Ai,Di,...,Ad,Dd}. Then DistrESolve{{[MQ\ki,... ,[MQ\kn},\bo\k,d,e) requires 
O (log i) iterations to return the k^^ component of the e close approximation for x*. 
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Proof. Notice that iterations in DistrESolve corresponds to Preconditioned Richardson Iteration: 

yt = [l — ZqMq\ yt-i + Zobo 

where Zq is the operator defined by DistrRSolve and yo = 0- Therefore, from Lemma [2) 

Finally, applying Lemma[6]of the main submission gives that DistrESolve algorithm needs O (log iterations 
to component of the e approximated solution for x*. □ 

Lemma 4. Let Mq = Dq — Aq he the standard splitting. Further, let e^ < | In 2 in the inverse approxi¬ 
mated chain C = {Aq,Dq,Ai,Di,...,A( 1 ,D(i]. Then, DistrESolve{{\]V[Q\ki-, ■ ■ ■ A^o]kn}-,\hQ\kid,e) requires 
O ((in^log(i)) time steps. 

Proof. Each iteration of DistrESolve algorithm calls DistRSolve routine, therefore, using Lemmas [2] and [3] the 
total time complexity of f DistrESolve algorithm is O i^dn^ log(^)) time steps 

□ 

Claim: Let k be the condition number of Mq = Dq — Aq, and denote the eigenvalues of Dq^Aq. Then, 

|Ai| < 1 — i, for alH = 1,..., n 

Proof. See Proposition 5.3 in [^. □ 

Claim: Let M be an SDDM matrix and consider the splitting M = D — A, with D being non negative 
diagonal and A being symmetric non negative. Further, assume that the eigenvalues of D~^A lie between —a 
and /3. Then, (1 — /3)D ^ D — A ^ (1 + a)D. 

Proof. See Proposition 5.4 in (2^. □ 

Lemma 5. Let Mq = Dq — Aq be the standard splitting. Further, let < 1 / 3 In2. Then Algorithmic requires 
O (log iterations to return the component of the e close approximation to x*. 

Proof. Please note that the iterations of EDistRSolve correspond to a distributed version of the preconditioned 
Richardson iteration scheme 

yt = [I — Z'^Molyt-i + Z'obQ 

with yo = 0 and Z'q being the operator defined by RDistRSolve. From Lemma fTTl it is clear that Z'q ^q^ ■ 
Applying Lemma [U provides that EDistRSolve requires O (log i/e) iterations to return the k^^ component of 
the e close approximation to x*. Finally, since EDistRSolve uses procedure RDistRSolve as a subroutine, it 
follows that for each node only communication between the R-hope neighbors is allowed. □ 

Lemma 6. Let Mq = Dq — Aq he the standard splitting and let Cd < i/3ln2, then EDistRSolve requires 
O ((2‘^/rq; + oRdmax) log (i/e)) time steps. Moreover, for each node v^, EDistRSolve only uses information 
from the R-hop neighbors. 

Proof. Notice that at each iteration EDistRSolve calls RDistRSolve as a subroutine, therefore, for each 
node Vk only R-hop communication is allowed. Lemma [11] gives that the time complexity of each 

iteration is O + aRdmax^, and using Lemma [5] immediately gives that the time complexity of 

C)((27Ra-FaRdmax)log(V0)- ^ 

Lemma 7. Let Mq = Dq — Aq be the standard splitting and let k denote the condition number Mq. Consider 
the inverse approximated chain C = {Aq, Dq, Ai, Di,..., Ad, Dd} with length d = [log ^2In ^ Ihen 

Dq Dq - Dq (Dq ^Aq)^ , with Cd < ^/zhi2. 

Proof. The proof takes similar steps as in Lemma [TH □ 
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